Unsupervised learning of multi-word verbs
Abstract
Collocation is a linguistic phenomenon that is difficult to define and harder to explain, and because of this difficulty it has been largely overlooked in computational linguistics. Although standard techniques exist for finding collocations, they tend to be noisy and suffer from sparse data problems. In this paper, we demonstrate that these problems with statistically learning collocations can be overcome by using parsed input to concentrate on one very specific type of collocation (in this case, verbs with particles, a subset of the so-called "multi-word" verbs) and by applying an algorithm that promotes those collocations in which we have more confidence.
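As an illustration of the kind of standard statistical technique the abstract refers to, the sketch below scores verb-particle pairs by pointwise mutual information (PMI). This is not the paper's algorithm, which the abstract does not specify. The function name `pmi_scores`, the `min_count` cutoff, and the toy pairs are all assumptions made for this example; in practice the pairs would come from parsed text.

```python
import math
from collections import Counter


def pmi_scores(pairs, min_count=2):
    """Score (verb, particle) pairs by pointwise mutual information.

    `pairs` is a list of (verb, particle) tuples, assumed here to have been
    extracted from parsed sentences. Pairs seen fewer than `min_count` times
    are dropped, a crude guard against sparse-data noise (hypothetical
    threshold, not taken from the paper).
    """
    pair_counts = Counter(pairs)
    verb_counts = Counter(verb for verb, _ in pairs)
    particle_counts = Counter(particle for _, particle in pairs)
    total = len(pairs)

    scores = {}
    for (verb, particle), n in pair_counts.items():
        if n < min_count:
            continue
        p_pair = n / total
        p_verb = verb_counts[verb] / total
        p_particle = particle_counts[particle] / total
        # PMI compares the observed pair probability with what independence
        # of verb and particle would predict.
        scores[(verb, particle)] = math.log2(p_pair / (p_verb * p_particle))
    return scores


# Toy data, invented for illustration only.
pairs = (
    [("give", "up")] * 3
    + [("give", "in")]
    + [("look", "up")]
    + [("look", "out")] * 3
)
scores = pmi_scores(pairs)
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 3))
```

The frequency cutoff shows why such measures are noisy on sparse data: rare pairs get unreliable scores, so they are either discarded or risk being over-ranked.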
Similar resources
An Unsupervised Verb Class Disambiguation
We present an unsupervised learning method for disambiguating verbs that belong to more than one Levin (1993) verb class when they occur in a particular syntactic frame. We used examples containing unambiguous verbs in each verb class as the training data for ambiguous verbs in that class. A Naive Bayesian classifier was employed for the disambiguation task, using context words as features. Our...
Unsupervised Verb Inference from Nouns Crossing Root Boundary
Inferring whether a word in one text has a meaning similar to a word in another text is essential for deciding whether the two texts have similar meaning. However, this inference becomes difficult when the two words do not share a lexical root, do not have the same argument structure, or do not have the same part of speech. This paper presents an unsupervised ap...
Distributional Semantics Approach to Thai Word Sense Disambiguation
Word sense disambiguation is one of the most important open problems in natural language processing applications such as information retrieval and machine translation. Several strategies can be employed to resolve word ambiguity with a reasonable degree of accuracy: knowledge-based, corpus-based, and hybrid. This paper pays attention to the corpus-based strategy...
Analysis of functional similarities of Finnish verbs using the self-organizing map
Obtaining semantic or functional word categories from data in an unsupervised manner is a problem motivated both from the linguistic point of view and from that of construing language models for various language processing tasks. In this work, we use the Self-Organizing Map algorithm to visualize and cluster common Finnish verbs based on their immediate morphological contexts. Based on a data s...
SEMANTIC CLUSTERING OF VERBS Analysis of Morphosyntactic Contexts Using the SOM Algorithm
Obtaining semantic or functional word categories from data in an unsupervised manner is a problem motivated both from the linguistic point of view and from that of construing language models for various language processing tasks. In this work, we use the self-organizing map algorithm to visualize and cluster common Finnish verbs based on functional and semantic information coded by case marking...